Fatty liver classification via risk controlled neural networks trained on grouped ultrasound image data

Ultrasound imaging is widely used for fatty liver diagnosis because it is affordable and can be deployed quickly with suitable devices. When it is applied to a patient, multiple images of the targeted tissues are produced. We propose a machine learning model for fatty liver diagnosis from multiple ultrasound images. The model extracts features from the ultrasound images with a pre-trained image encoder. It then produces a summary embedding of these features with a graph neural network. The summary embedding is used as input to a classifier for fatty liver diagnosis. We train the model on an ultrasound image dataset collected by Taiwan Biobank. We also carry out risk control on the model using conformal prediction. Under the risk control procedure, the classifier's results improve and come with high-probability guarantees.

where the attention weight $w_{sr}^{[l+1]}$ is defined in terms of a softmax function.
Graph Isomorphism Network (GIN): The forward propagation of the representation of node $s$ from block $l$ to block $(l + 1)$ is given by an update in which the node's own representation enters through the term $(1 + \epsilon)\, h_s^{[l]}$.
Multilayer Perceptron (MLP): The forward propagation of the representation of node $s$ from block $l$ to block $(l + 1)$ is computed from $h_s^{[l]}$ alone. Note that in the Multilayer Perceptron, the graph structure is not used.
Global pooling layer: The global pooling layer aggregates the node representations into a single summary embedding.
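As a rough illustration of how a GIN block differs from an MLP block, the following NumPy sketch may help. It assumes the commonly stated GIN update (the self term $(1 + \epsilon)\, h_s^{[l]}$ plus the sum of the neighbours' representations, with a single linear layer and ReLU standing in for the inner network), and it assumes mean pooling for the global pooling layer. Neither the aggregation details nor the pooling operator are specified in the text above, and all function names are hypothetical.

```python
import numpy as np


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def gin_block(H, A, W, eps=0.0):
    """One GIN-style block (assumed form, not taken from the paper).

    H: (n_nodes, d_in) node representations, one row per ultrasound image.
    A: (n_nodes, n_nodes) 0/1 adjacency matrix without self-loops.
    W: (d_in, d_out) weights of a single linear layer standing in for the inner network.
    """
    aggregated = (1.0 + eps) * H + A @ H  # self term plus sum over neighbours
    return relu(aggregated @ W)


def mlp_block(H, W):
    """MLP block: every node is transformed on its own; A is never used."""
    return relu(H @ W)


def global_mean_pool(H):
    """Assumed pooling operator: average the node representations."""
    return H.mean(axis=0)


# Three nodes on a path graph 0 - 1 - 2, two features per node.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
W = np.eye(2)

summary_embedding = global_mean_pool(gin_block(H, A, W))
```

With the identity weight matrix, the MLP block returns `H` unchanged, whereas the GIN block mixes each node with its neighbours before pooling.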

Appendix B
Below we describe the conformal prediction approach to controlling the risk of the machine learning models. Conformal prediction aims to estimate a set $C(X_{\text{test}})$ such that $P(Y_{\text{test}} \in C(X_{\text{test}})) \ge 1 - \alpha$ for a given risk control level $\alpha$. Below we only consider conformal prediction for the classification problem. Conformal prediction for classification is based on the following idea [9]. It constructs the prediction set $C(X_{\text{test}})$ by finding a conformity score $s(X, Y)$ such that the relation $P(s(X_{\text{test}}, Y_{\text{test}}) \le q) \ge 1 - \alpha$ holds; the prediction set then collects the labels $y$ with $s(X_{\text{test}}, y) \le q$. Here the value $q$ is a threshold and can be found by investigating the distribution of the conformity scores. In practice the distribution of the conformity scores can be approximated by the empirical distribution of the conformity scores computed from the calibration set.
In our paper we considered three methods for controlling the risk of our machine learning model. The first method, Naive Prediction Set, sets the threshold value $q = 1 - \alpha$, where $\alpha$ is the risk control level. The other two methods, Adaptive Prediction Sets [9] and Regularized Adaptive Prediction Sets [2], are based on conformal prediction. They set the threshold value $q$ according to the empirical distribution of the conformity scores of the calibration set. In practice, if the risk control level is $\alpha$, then $q$ is defined as the $\lceil (1 - \alpha)(n_{\text{cal}} + 1) \rceil / (n_{\text{cal}} + 1)$-quantile of the empirical distribution, where $n_{\text{cal}}$ is the sample size of the calibration set.
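The two ways of choosing the threshold can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: it realizes the empirical quantile by picking the $\lceil (1 - \alpha)(n_{\text{cal}} + 1) \rceil$-th smallest calibration score, which is one common finite-sample convention and is an assumption here. The function names are hypothetical.

```python
import math

import numpy as np


def naive_threshold(alpha):
    """Naive Prediction Set: the threshold is fixed at 1 - alpha."""
    return 1.0 - alpha


def conformal_threshold(cal_scores, alpha):
    """Threshold from the empirical distribution of calibration scores.

    Picks the k-th smallest score with k = ceil((1 - alpha) * (n_cal + 1)).
    This order-statistic convention is an assumption of the sketch.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n_cal = scores.size
    k = math.ceil((1.0 - alpha) * (n_cal + 1))
    if k > n_cal:
        # Too few calibration samples for this alpha: an infinite threshold
        # means every label ends up in the prediction set.
        return float("inf")
    return float(scores[k - 1])


# Toy calibration scores (illustrative values only).
q = conformal_threshold([0.3, 0.1, 0.2], alpha=0.25)
```

With three calibration scores and $\alpha = 0.25$, $k = \lceil 0.75 \cdot 4 \rceil = 3$, so the threshold is the largest calibration score.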
Now let $\sigma_k$ be the score of the $k$th diagnosis. Below we describe how the two methods compute the conformity scores from the calibration set, using the ground truth label of each calibration sample and a random variable $u \sim \text{Uniform}(0, 1)$.
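For orientation, the sketch below computes the two conformity scores in the form in which they are commonly stated in the conformal prediction literature for Adaptive Prediction Sets [9] and Regularized Adaptive Prediction Sets [2]. It is an assumption that the paper uses exactly these forms. The regularization parameters `lam` and `k_reg`, their default values, and the function names are hypothetical, and ties between scores are ignored.

```python
import numpy as np


def aps_score(sigma, y, u):
    """APS-style conformity score (assumed form).

    sigma: scores of all diagnoses for one subject (e.g. softmax outputs).
    y: index of the ground truth label.
    u: a draw from Uniform(0, 1) used for randomization.

    Sums the scores of all diagnoses ranked strictly above the true one,
    then adds a random fraction u of the true diagnosis's own score.
    """
    sigma = np.asarray(sigma, dtype=float)
    more_likely = sigma > sigma[y]
    return float(sigma[more_likely].sum() + u * sigma[y])


def raps_score(sigma, y, u, lam=0.1, k_reg=1):
    """RAPS-style conformity score (assumed form).

    Adds a penalty lam for every rank position of the true label beyond k_reg.
    """
    sigma = np.asarray(sigma, dtype=float)
    rank = int((sigma > sigma[y]).sum()) + 1  # 1-based rank of the true label
    return aps_score(sigma, y, u) + lam * max(rank - k_reg, 0)


# One subject with four candidate diagnoses; the true label is ranked second.
rng = np.random.default_rng(0)
u = rng.uniform()
score = raps_score([0.5, 0.25, 0.125, 0.125], y=1, u=u)
```

When the true label is the top-ranked diagnosis and `k_reg` is at least 1, the penalty vanishes and the two scores coincide.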

Appendix D
Further results of the two-class classification under the three risk control methods:

Table 7:
Results of the average numbers of subjects of different classes under the three risk control methods at α = 0.1. $n_{\text{correct}}$ = the average number of subjects whose labels are correctly covered by the prediction sets. $n$ = the average number of subjects. The average numbers were calculated based on 10 replicates. 0 = normal; 1 = mild; 2 = moderate; 3 = severe.

Table 2:
Results of the average numbers of subjects of the unambiguous group and ambiguous group

Table 3:
Results of the average numbers of subjects of different classes under the three risk control methods at α = 0.1. $n_{\text{correct}}$ = the average number of subjects whose labels are correctly covered by the prediction sets. $n$ = the average number of subjects. The average numbers were calculated based on 10 replicates. 0 = normal; 1 = mild, or moderate, or severe.

Further results of the three-class classification under the three risk control methods:

Table 4:
Results of the average numbers of subjects of the unambiguous group and ambiguous group under the three risk control methods (α = 0.1). The average numbers were calculated based on 10 replicates.

Table 5:
Results of the average numbers of subjects of different classes under the three risk control methods at α = 0.1. $n_{\text{correct}}$ = the average number of subjects whose labels are correctly covered by the prediction sets. $n$ = the average number of subjects. The average numbers were calculated based on 10 replicates.

Table 6:
Results of the average numbers of subjects of the unambiguous group and ambiguous group under the three risk control methods (α = 0.1). The average numbers were calculated based on 10 replicates.